2022 Innovation Camp Coding Challenge

Table of Contents

  • Introduction
  • Instructions
    1. Load the data
    2. Clean the data
    3. Create a plot
    4. Do some analysis
    5. Create a new plot
    6. Prepare a report
  • Further Directions

Introduction

R Markdown files consist of chunks of code written in R, interleaved with text written in Markdown. You can run the code chunk by chunk, or knit the entire document at once.

In the following chunk we load the packages we will need and set preferences for knitting the document. Anything after a “#” symbol on a line is a comment and is ignored when the code runs.
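As a rough sketch of what a setup chunk can look like: the knitting options and the commented-out package below are assumptions for illustration, not necessarily the ones used in this workshop.

```r
# Everything after a "#" on a line is a comment; R does not run it.

# library(dplyr)  # commented out, so this line does nothing

# Assumption: illustrative knitting preferences (show code, hide messages).
# knitr is the package that knits R Markdown; skip quietly if it is missing.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
}

x <- 1 + 1  # a trailing comment: only `x <- 1 + 1` is evaluated
print(x)
```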

Instructions

1. Load the data

Today we will be working with the table “Stocks of specified dairy products”. This data is stored in a Statistics Canada CODR (Common Output Data Repository) table.

Let’s get to know the data a bit better by checking out the columns and the range of values.

##    REF_DATE             GEO               DGUID               UOM           
##  Length:37812       Length:37812       Length:37812       Length:37812      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     UOM_ID          SCALAR_FACTOR       SCALAR_ID            VECTOR         
##  Length:37812       Length:37812       Length:37812       Length:37812      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   COORDINATE            VALUE          STATUS             SYMBOL         
##  Length:37812       Min.   :    0   Length:37812       Length:37812      
##  Class :character   1st Qu.:  527   Class :character   Class :character  
##  Mode  :character   Median : 2219   Mode  :character   Mode  :character  
##                     Mean   : 6979                                        
##                     3rd Qu.: 9419                                        
##                     Max.   :74242                                        
##                     NA's   :11588                                        
##   TERMINATED          DECIMALS            GeoUID          Hierarchy for GEO 
##  Length:37812       Length:37812       Length:37812       Length:37812      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  Classification Code for Stocks Hierarchy for Stocks
##  Length:37812                   Length:37812        
##  Class :character               Class :character    
##  Mode  :character               Mode  :character    
##                                                     
##                                                     
##                                                     
##                                                     
##  Classification Code for Commodity Hierarchy for Commodity    val_norm    
##  Length:37812                      Length:37812            Min.   :    0  
##  Class :character                  Class :character        1st Qu.:  527  
##  Mode  :character                  Mode  :character        Median : 2219  
##                                                            Mean   : 6979  
##                                                            3rd Qu.: 9419  
##                                                            Max.   :74242  
##                                                            NA's   :11588  
##       Date                                           Stocks     
##  Min.   :1970-01-01   Total stocks                      :19560  
##  1st Qu.:1995-04-01   Manufactures and government stocks:11070  
##  Median :2004-08-01   Retail and wholesale stocks       : 7182  
##  Mean   :2002-12-05                                             
##  3rd Qu.:2013-07-01                                             
##  Max.   :2022-06-01                                             
##                                                                 
##              Commodity    
##  Creamery butter  :11100  
##  Cheddar cheese   :10638  
##  Variety cheese   : 6234  
##  Whey butter      : 1314  
##  Process cheese   : 1314  
##  Whole milk powder:  816  
##  (Other)          : 6396
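The listing above has the shape of `summary()` output for a data frame. A minimal sketch of the same kind of inspection, run on a small made-up data frame (the name `toy` and all of its values are hypothetical, and only a few of the real columns are mimicked):

```r
# Toy stand-in for the dairy table; every value here is made up.
toy <- data.frame(
  REF_DATE  = c("2022-04", "2022-05", "2022-06", "2022-06"),
  GEO       = c("Canada", "Canada", "Canada", "Quebec"),
  VALUE     = c(100, NA, 250, 80),
  Commodity = factor(c("Creamery butter", "Creamery butter",
                       "Cheddar cheese", "Cheddar cheese")),
  stringsAsFactors = FALSE
)

names(toy)                      # column names
summary(toy)                    # per-column overview, including NA counts
range(toy$VALUE, na.rm = TRUE)  # smallest and largest non-missing value
sum(is.na(toy$VALUE))           # number of missing values
```

Character columns only report their length and class, while numeric, date and factor columns get quartiles or counts, which is why most columns in the listing look alike.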

What are some things you notice about the data?

2. Clean the data

In short, “cleaning data” means preparing it for analysis. Removing empty values, converting values to the same format, and otherwise manipulating your data to improve its quality and uniformity are all examples of cleaning.
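Two of these steps can be sketched in base R on a made-up data frame. The `"YYYY-MM"` format assumed for `REF_DATE` is an assumption for illustration, as are the values.

```r
# Toy data; values and the "YYYY-MM" date format are assumptions.
toy <- data.frame(
  REF_DATE = c("2022-04", "2022-05", "2022-06"),
  VALUE    = c(100, NA, 250),
  stringsAsFactors = FALSE
)

# Remove empty values: keep only rows with a non-missing VALUE.
clean <- toy[!is.na(toy$VALUE), ]

# Convert to a common format: "YYYY-MM" text becomes a Date on the 1st.
clean$Date <- as.Date(paste0(clean$REF_DATE, "-01"))

print(clean)
```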

How might we need to clean this data?

Now let’s drop some columns we aren’t really interested in. Using the above as an example, drop “SCALAR_FACTOR”, “SCALAR_ID”, and “DECIMALS”. Note that there are many different equivalent ways to drop columns.
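To illustrate the point about equivalent approaches, here is a sketch of three ways to drop columns in base R, using a made-up data frame (`toy` and its values are hypothetical). The workshop's own chunks may use a different approach, such as `dplyr::select()`.

```r
# Toy data frame with the three columns to be dropped; values are made up.
toy <- data.frame(
  GEO           = c("Canada", "Quebec"),
  VALUE         = c(100, 80),
  SCALAR_FACTOR = c("units", "units"),
  SCALAR_ID     = c("0", "0"),
  DECIMALS      = c("0", "0"),
  stringsAsFactors = FALSE
)
drop_cols <- c("SCALAR_FACTOR", "SCALAR_ID", "DECIMALS")

# Way 1: keep every column whose name is not in drop_cols.
kept1 <- toy[, setdiff(names(toy), drop_cols), drop = FALSE]

# Way 2: negative selection with subset().
kept2 <- subset(toy, select = -c(SCALAR_FACTOR, SCALAR_ID, DECIMALS))

# Way 3: assign NULL to the unwanted columns of a copy.
kept3 <- toy
kept3[drop_cols] <- list(NULL)

names(kept1)
```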

3. Create a plot

We’re now a bit more familiar with our data, but it can be difficult to parse from a table! Let’s create a plot so we can get a better idea of what’s going on. What are some things we might want to find out and what is the best way to visualize them?

Since we have date information, it makes sense to make a time series plot! To keep things simple, let’s focus on creamery butter in Canada over time.
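A minimal sketch of such a time series plot, using base R graphics and made-up monthly values. The data frame `toy` is hypothetical, and the workshop's own plot may well be built with a different plotting package.

```r
# Toy monthly data; the values are made up for illustration.
toy <- data.frame(
  Date      = rep(seq(as.Date("2022-01-01"), by = "month", length.out = 6), 2),
  GEO       = "Canada",
  Stocks    = "Total stocks",
  Commodity = rep(c("Creamery butter", "Cheddar cheese"), each = 6),
  VALUE     = c(20, 22, 25, 30, 34, 36, 50, 51, 49, 52, 53, 55),
  stringsAsFactors = FALSE
)

# Keep one product, one region and one stock type, ordered by date.
butter <- subset(toy, GEO == "Canada" &
                      Commodity == "Creamery butter" &
                      Stocks == "Total stocks")
butter <- butter[order(butter$Date), ]

# Line plot of stocks over time.
plot(butter$Date, butter$VALUE, type = "l",
     main = "Creamery butter stocks in Canada",
     xlab = "Date", ylab = "Stocks (VALUE)")
```

Filtering on `Stocks` as well as `GEO` and `Commodity` matters because the table holds several stock types for the same date, and leaving them all in would draw a zig-zag line.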

Using the code above, make a new plot showing the stocks of another dairy product in another region over time. Be sure to update the title as appropriate.

We might also be curious about the breakdown of dairy product stocks by type. Let’s visualize this in a pie chart. To keep it simple, let’s focus on Canada and the most recent data, which is June 2022.
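A pie chart along these lines can be sketched with base R's `pie()`. The data frame `toy` and its values are made up, so the shares shown are not real figures.

```r
# Toy snapshot for one month; the values are made up for illustration.
toy <- data.frame(
  Date      = as.Date("2022-06-01"),
  GEO       = "Canada",
  Stocks    = "Total stocks",
  Commodity = c("Creamery butter", "Cheddar cheese", "Variety cheese"),
  VALUE     = c(30, 50, 20),
  stringsAsFactors = FALSE
)

# Keep Canada, total stocks, June 2022, and drop missing values.
latest <- subset(toy, GEO == "Canada" & Stocks == "Total stocks" &
                      Date == as.Date("2022-06-01") & !is.na(VALUE))

# Share of each commodity, used for the slice labels.
shares <- latest$VALUE / sum(latest$VALUE)

pie(latest$VALUE,
    labels = paste0(latest$Commodity, " (", round(100 * shares), "%)"),
    main = "Dairy product stocks in Canada, June 2022")
```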

Make another pie chart for another time period.

4. Do some analysis

Next we need to transform our data a bit further. Let’s compute the most numerous commodity.
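“Most numerous” can be read in more than one way, so the sketch below shows two possible interpretations on made-up data: the commodity with the most rows, and the commodity with the largest total value. Which one the exercise intends is an assumption either way.

```r
# Toy data; values are made up for illustration.
toy <- data.frame(
  Commodity = c("Creamery butter", "Creamery butter", "Creamery butter",
                "Cheddar cheese", "Cheddar cheese"),
  VALUE     = c(10, 12, NA, 40, 45),
  stringsAsFactors = FALSE
)

# Reading 1: the commodity that appears in the most rows.
counts <- table(toy$Commodity)
most_rows <- names(counts)[which.max(counts)]

# Reading 2: the commodity with the largest total VALUE (NAs ignored).
totals <- tapply(toy$VALUE, toy$Commodity, sum, na.rm = TRUE)
most_stock <- names(totals)[which.max(totals)]

most_rows
most_stock
```

In this toy example the two readings disagree, which is a reminder to state which one a report is using.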

5. Create a new plot

6. Prepare a report

Write a short summary of what you have learned. You may wish to include some plots!

Further Directions

  1. Optimization: data.table is a package whose table structure is often faster than the built-in data.frame, especially on large datasets. Can you rewrite the code to make use of this structure instead?

  2. Collaboration: You could put your code on GitLab and have your colleagues provide feedback or even contribute to your code.

  3. Interactivity: Shiny (often called RShiny) is an R package for creating interactive dashboards and other web applications.
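As a starting point for the optimization idea, here is a hedged sketch of a filter-and-aggregate step written with data.table. The data frame `toy` and its values are made up, and the code only runs if the data.table package is installed.

```r
# Toy data; values are made up for illustration.
toy <- data.frame(
  GEO       = c("Canada", "Canada", "Canada", "Quebec"),
  Commodity = c("Creamery butter", "Cheddar cheese", "Cheddar cheese",
                "Creamery butter"),
  VALUE     = c(30, 40, 45, 10),
  stringsAsFactors = FALSE
)

if (requireNamespace("data.table", quietly = TRUE)) {
  # Convert the data.frame to a data.table.
  dt <- data.table::as.data.table(toy)

  # dt[i, j, by]: filter rows, compute a summary, group by a column.
  totals_dt <- dt[GEO == "Canada",
                  .(total = sum(VALUE, na.rm = TRUE)),
                  by = Commodity]
  print(totals_dt)
}
```

The same filter, summary and grouping would take several separate steps on a plain data.frame, which is part of what makes the data.table form worth comparing.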